Review for NeurIPS paper: MOReL: Model-Based Offline Reinforcement Learning

Neural Information Processing Systems

Additional Feedback: Most recent offline RL algorithms rely on policy regularization, where the learned policy is prevented from deviating too far from the data-logging policy. In contrast, MOReL does not directly depend on the data-logging policy but instead applies pessimism within a model-based approach, providing another promising direction for offline RL. However, it would be more natural to penalize more uncertain states more heavily. For example, one classical model-based RL algorithm (MBIE-EB) constructs an optimistic MDP that rewards uncertain regions with a bonus proportional to 1/sqrt(N(s, a)), where N(s, a) is the visitation count. By analogy, we may consider a pessimistic MDP that penalizes uncertain regions with a penalty proportional to 1/sqrt(N(s, a)). How is using alpha greater than zero justified for USAD? It would also be great to see how sensitive the performance of the algorithm is with respect to kappa in the reward penalty and the threshold in USAD.
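The reviewer's suggested count-based alternative can be sketched as follows. This is a minimal illustration of the MBIE-EB-style penalty the review describes, not MOReL's actual USAD mechanism; the kappa value, the fixed fallback penalty for unvisited pairs, and the toy counts are all hypothetical:

```python
import math
from collections import defaultdict

def pessimistic_reward(reward, n_visits, kappa=1.0, unknown_penalty=-100.0):
    """Count-based pessimism: subtract kappa / sqrt(N(s, a)) from the reward,
    mirroring the optimistic MBIE-EB bonus with the opposite sign.
    Unvisited pairs fall back to a large fixed penalty (a crude stand-in
    for an unknown-state detector)."""
    if n_visits == 0:
        return unknown_penalty
    return reward - kappa / math.sqrt(n_visits)

# Example: visit counts gathered from an offline dataset.
counts = defaultdict(int)
for s, a in [(0, 0), (0, 0), (0, 0), (0, 0), (1, 1)]:
    counts[(s, a)] += 1

print(pessimistic_reward(1.0, counts[(0, 0)]))  # 1.0 - 1/sqrt(4) = 0.5
print(pessimistic_reward(1.0, counts[(1, 1)]))  # 1.0 - 1/sqrt(1) = 0.0
print(pessimistic_reward(1.0, counts[(2, 0)]))  # unvisited -> -100.0
```

Rarely visited pairs are thus penalized more, smoothly interpolating between fully known and fully unknown regions rather than thresholding as USAD does.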


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

Update after rebuttal: I thank the authors for a comprehensive rebuttal and extra experiments. It has addressed most of my concerns, and I have updated my score. The authors should make sure to properly tone down the claims about improved training for LDA. It seems to me that we do not really understand very well what is happening in these models at this stage; this perplexity experiment is just scratching the surface (and should be presented as such). I am also a bit puzzled by the use of alpha = 1.001.


Review for NeurIPS paper: Adapting Neural Architectures Between Domains

Neural Information Processing Systems

In AdaptNAS, a domain discriminator is used to approximate the domain discrepancy, which might introduce a certain amount of computational overhead. In particular, [23] performs consistently better than the proposed method while only searching on CIFAR-10. More importantly, there seems to be no ablation study comparing using the domain adaptation loss L_d against not using it. This makes it difficult to identify whether the performance gain comes from using training data from both domains (L_S, L_T) or from the domain adaptation loss (L_d), which is the main contribution. However, the setting without L_d (alpha = 0) performs even better than most of the other settings where L_d is present.


Thompson Sampling and Approximate Inference

Neural Information Processing Systems

We study the effects of approximate inference on the performance of Thompson sampling in k-armed bandit problems. Thompson sampling is a successful algorithm for online decision-making but requires posterior inference, which often must be approximated in practice. We show that even small constant inference error (in \alpha-divergence) can lead to poor performance (linear regret) due to under-exploration (for \alpha < 1) or over-exploration (for \alpha > 0) by the approximation. While for \alpha > 0 this is unavoidable, for \alpha \leq 0 the regret can be improved by adding a small amount of forced exploration even when the inference error is a large constant.
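The forced-exploration fix the abstract describes can be sketched on a Bernoulli bandit. This is a minimal illustration with exact Beta posteriors (the paper's \alpha-divergence approximations are not reproduced here); the epsilon value and the toy arm means are assumptions:

```python
import random

def thompson_step(successes, failures, rng, epsilon=0.0):
    """One round of Thompson sampling on a Bernoulli bandit.
    With probability epsilon, play a uniformly random arm instead of the
    posterior sample -- the kind of forced exploration suggested to repair
    over-concentrated approximate posteriors."""
    k = len(successes)
    if rng.random() < epsilon:
        return rng.randrange(k)  # forced exploration
    # Sample a mean for each arm from its Beta(1+s, 1+f) posterior, play argmax.
    samples = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
    return max(range(k), key=samples.__getitem__)

rng = random.Random(0)
true_means = [0.3, 0.7]
successes, failures = [0, 0], [0, 0]
for _ in range(2000):
    arm = thompson_step(successes, failures, rng, epsilon=0.05)
    if rng.random() < true_means[arm]:
        successes[arm] += 1
    else:
        failures[arm] += 1
print(successes[1] + failures[1] > successes[0] + failures[0])  # best arm pulled more
```

With a well-behaved posterior the better arm dominates the pull counts; the paper's point is that a systematically over-confident approximate posterior can lock onto the wrong arm, which the small epsilon of forced exploration mitigates.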


Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs

Neural Information Processing Systems

Given \alpha, \epsilon, we study the time complexity required to improperly learn a halfspace with misclassification error rate of at most (1 + \alpha) L^*_\gamma + \epsilon, where L^*_\gamma is the optimal \gamma-margin error rate. For \alpha = 1/\gamma, polynomial time and sample complexity are achievable using the hinge loss. An immediate question, which this paper tackles, is what is achievable for \alpha \in (0, 1/\gamma). We derive positive results interpolating between the polynomial time for \alpha = 1/\gamma and the exponential time for \alpha = 0. In particular, we show that there are cases in which \alpha = o(1/\gamma) but the problem is still solvable in polynomial time.
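The tractable regime mentioned above rests on replacing the zero-one loss with its convex hinge surrogate. A minimal sketch of hinge-loss subgradient descent for a homogeneous halfspace follows; the step size, epoch count, and toy data are assumptions, and this is not the paper's algorithm for intermediate \alpha:

```python
import random

def hinge_sgd(data, dim, epochs=50, lr=0.1, rng=None):
    """Learn a halfspace w by minimizing the hinge loss max(0, 1 - y * <w, x>)
    with plain subgradient descent -- the convex surrogate behind the
    polynomial-time alpha = 1/gamma regime."""
    rng = rng or random.Random(0)
    w = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            if margin < 1:  # hinge subgradient is -y * x here
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w

# Linearly separable toy data: label = sign of the first coordinate.
data = [((1.0, 0.2), 1), ((0.8, -0.5), 1), ((-1.0, 0.1), -1), ((-0.7, 0.4), -1)]
w = hinge_sgd(list(data), dim=2)
preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1 for x, _ in data]
print(preds == [y for _, y in data])  # True: zero-one loss 0 on this sample
```

On separable data the surrogate also drives the zero-one loss to zero; the paper's question is how much of this tractability survives when one demands the sharper (1 + \alpha) approximation factor.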


Speeding up Heterogeneous Federated Learning with Sequentially Trained Superclients

Zaccone, Riccardo, Rizzardi, Andrea, Caldarola, Debora, Ciccone, Marco, Caputo, Barbara

arXiv.org Artificial Intelligence

Abstract--Federated Learning (FL) allows training machine learning models in privacy-constrained scenarios by enabling the cooperation of edge devices without requiring local data sharing. This approach raises several challenges due to the different statistical distribution of the local datasets and the clients' computational heterogeneity. As a solution, we propose FedSeq, a novel framework leveraging the sequential training of subgroups of heterogeneous clients, i.e. superclients, to emulate the centralized paradigm in a privacy-compliant way. In 2017, McMahan et al. [25] introduced Federated Learning. In FL, the clients are involved in an iterative two-step process over several communication rounds: (i) independent training on edge devices on local datasets, and (ii) aggregation of the updated models into a shared global one on the server side. This approach is usually effective in homogeneous scenarios, but fails to reach comparable performance in non-i.i.d. settings. In particular, it has been shown that the non-iidness of local datasets leads to unstable and slow convergence [23]. In this work, we tackle, among other problems, non-identical class distributions across clients. Inspired by the differences with the standard centralized training procedure, which bounds any FL algorithm, we introduce Federated Learning via Sequential Superclients Training (FedSeq), a novel algorithm that leverages sequential training. We simulate the presence of homogeneous and larger datasets: clients with dissimilar distributions are grouped, forming a superclient based on a dissimilarity metric.
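The contrast between the classic two-step FL round and sequential superclient training can be sketched with scalar "models" trained on local quadratics. This is a toy illustration of the round structure only, not FedSeq's actual training or grouping procedure; the learning rate, step counts, and client optima are assumptions:

```python
def local_train(w, data_mean, steps=5, lr=0.3):
    """One client's local update: gradient steps on (w - data_mean)^2 / 2."""
    for _ in range(steps):
        w -= lr * (w - data_mean)
    return w

def fedavg_round(w_global, client_means):
    """Classic FL round: (i) independent local training, (ii) server averaging."""
    updates = [local_train(w_global, m) for m in client_means]
    return sum(updates) / len(updates)

def superclient_round(w_global, superclients):
    """FedSeq-style round (sketch): inside each superclient the model is
    trained sequentially, client after client, emulating a larger and more
    homogeneous dataset; only the superclient results are averaged."""
    updates = []
    for group in superclients:
        w = w_global
        for m in group:  # sequential pass through heterogeneous clients
            w = local_train(w, m)
        updates.append(w)
    return sum(updates) / len(updates)

# Four clients with very different (non-i.i.d.) local optima.
means = [0.0, 0.0, 10.0, 10.0]
w_avg = fedavg_round(5.0, means)
# Pair dissimilar clients into superclients, as a dissimilarity grouping would.
w_seq = superclient_round(5.0, [[0.0, 10.0], [10.0, 0.0]])
print(w_avg, w_seq)
```

The design point is that each superclient sees a mixture of dissimilar distributions within one round, so its sequential update behaves more like training on a pooled, homogeneous dataset than any single client's update does.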


Mining Feature Relationships in Data

Lensen, Andrew

arXiv.org Artificial Intelligence

When faced with a new dataset, most practitioners begin by performing exploratory data analysis to discover interesting patterns and characteristics within data. Techniques such as association rule mining are commonly applied to uncover relationships between features (attributes) of the data. However, association rules are primarily designed for use on binary or categorical data, due to their use of rule-based machine learning. A large proportion of real-world data is continuous in nature, and discretisation of such data leads to inaccurate and less informative association rules. In this paper, we propose an alternative approach called feature relationship mining (FRM), which uses a genetic programming approach to automatically discover symbolic relationships between continuous or categorical features in data. To the best of our knowledge, our proposed approach is the first such symbolic approach with the goal of explicitly discovering relationships between features. Empirical testing on a variety of real-world datasets shows the proposed method is able to find high-quality, simple feature relationships which can be easily interpreted and which provide clear and non-trivial insight into data.
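The flavor of feature relationship mining can be illustrated with a much simpler stand-in for the GP search: exhaustively scoring tiny symbolic expressions (one operator over two features) against a chosen target feature. The operator set, the correlation-based score, and the toy dataset are all assumptions; the paper's method evolves deeper expression trees instead:

```python
import itertools, math, random

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    vy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (vx * vy) if vx and vy else 0.0

OPS = {'add': lambda a, b: a + b, 'mul': lambda a, b: a * b}

def mine_relationships(features, target_name):
    """Score every one-operator expression over pairs of other features
    against the target; return the best (score, expression) pair."""
    target = features[target_name]
    others = [k for k in features if k != target_name]
    best = None
    for (a, b), (op_name, op) in itertools.product(
            itertools.combinations(others, 2), OPS.items()):
        preds = [op(x, y) for x, y in zip(features[a], features[b])]
        score = abs(pearson(preds, target))
        if best is None or score > best[0]:
            best = (score, f'{op_name}({a}, {b})')
    return best

# Toy data where area = mul(width, height) exactly.
rng = random.Random(1)
w = [rng.uniform(1, 5) for _ in range(50)]
h = [rng.uniform(1, 5) for _ in range(50)]
feats = {'width': w, 'height': h, 'area': [a * b for a, b in zip(w, h)]}
score, expr = mine_relationships(feats, 'area')
print(expr, round(score, 2))  # mul(width, height) 1.0
```

Exhaustive search is only viable for expressions this small; the genetic programming approach in the paper exists precisely because the space of deeper symbolic expressions is too large to enumerate.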


Non-Stationary Delayed Bandits with Intermediate Observations

Vernade, Claire, Gyorgy, Andras, Mann, Timothy

arXiv.org Machine Learning

Online recommender systems often face long delays in receiving feedback, especially when optimizing for some long-term metrics. While mitigating the effects of delays in learning is well-understood in stationary environments, the problem becomes much more challenging when the environment changes. In fact, if the timescale of the change is comparable to the delay, it is impossible to learn about the environment, since the available observations are already obsolete. However, the arising issues can be addressed if intermediate signals are available without delay, such that given those signals, the long-term behavior of the system is stationary. To model this situation, we introduce the problem of stochastic, non-stationary, delayed bandits with intermediate observations. We develop a computationally efficient algorithm based on UCRL, and prove sublinear regret guarantees for its performance. Experimental results demonstrate that our method is able to learn in non-stationary delayed environments where existing methods fail.


On Mutual Information in Contrastive Learning for Visual Representations

Wu, Mike, Zhuang, Chengxu, Mosse, Milan, Yamins, Daniel, Goodman, Noah

arXiv.org Machine Learning

In recent years, several unsupervised, "contrastive" learning algorithms in vision have been shown to learn representations that perform remarkably well on transfer tasks. We show that this family of algorithms maximizes a lower bound on the mutual information between two or more "views" of an image where typical views come from a composition of image augmentations. Our bound generalizes the InfoNCE objective to support negative sampling from a restricted region of "difficult" contrasts. We find that the choice of negative samples and views are critical to the success of these algorithms. Reformulating previous learning objectives in terms of mutual information also simplifies and stabilizes them. In practice, our new objectives yield representations that outperform those learned with previous approaches for transfer to classification, bounding box detection, instance segmentation, and keypoint detection. The mutual information framework provides a unifying comparison of approaches to contrastive learning and uncovers the choices that impact representation learning.
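The mutual-information bound at the center of this framework is easy to state concretely. Below is a minimal sketch of the standard InfoNCE estimator (the generic Oord-et-al.-style bound, not the paper's generalized objective with restricted negative sampling); the toy similarity scores are assumptions:

```python
import math, random

def info_nce(scores):
    """InfoNCE loss for one positive against k negatives: the negative
    log-softmax probability assigned to the positive's score (scores[0])."""
    denom = sum(math.exp(s) for s in scores)
    return -math.log(math.exp(scores[0]) / denom)

def mi_lower_bound(batch_losses, num_negatives):
    """log(k + 1) minus the average InfoNCE loss lower-bounds the mutual
    information between the two views."""
    return math.log(num_negatives + 1) - sum(batch_losses) / len(batch_losses)

# Toy scores: the positive pair gets a higher similarity than 7 negatives.
rng = random.Random(0)
losses = []
for _ in range(100):
    scores = [3.0] + [rng.gauss(0.0, 1.0) for _ in range(7)]
    losses.append(info_nce(scores))
print(mi_lower_bound(losses, num_negatives=7) > 0)  # a positive MI estimate
```

Note the log(k + 1) ceiling: the bound saturates with few negatives, which is one reason the choice of negative samples matters so much in practice.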


Learning LWF Chain Graphs: A Markov Blanket Discovery Approach

Javidian, Mohammad Ali, Valtorta, Marco, Jamshidi, Pooyan

arXiv.org Artificial Intelligence

This paper provides a graphical characterization of Markov blankets in chain graphs (CGs) under the Lauritzen-Wermuth-Frydenberg (LWF) interpretation. The characterization is different from the well-known one for Bayesian networks and generalizes it. We provide a novel scalable and sound algorithm for Markov blanket discovery in LWF CGs and prove that the Grow-Shrink algorithm, the IAMB algorithm, and its variants are still correct for Markov blanket discovery in LWF CGs under the same assumptions as for Bayesian networks. We provide a sound and scalable constraint-based framework for learning the structure of LWF CGs from faithful causally sufficient data and prove its correctness when the Markov blanket discovery algorithms in this paper are used. Our proposed algorithms compare positively/competitively against the state-of-the-art LCD (Learn Chain graphs via Decomposition) algorithm, depending on the algorithm that is used for Markov blanket discovery. Our proposed algorithms make a broad range of inference/learning problems computationally tractable and more reliable because they exploit locality.
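The Grow-Shrink algorithm mentioned above can be sketched against an abstract conditional-independence oracle; this is the generic Margaritis-Thrun procedure, not the paper's CG-specific machinery, and the toy oracle below is a hypothetical stand-in for a statistical test on data:

```python
def grow_shrink(target, variables, indep):
    """Grow-Shrink Markov blanket discovery, given an oracle
    indep(x, y, cond_set) -> bool for conditional independence."""
    blanket = []
    # Grow phase: add any variable still dependent on the target
    # given the current blanket estimate; repeat until stable.
    changed = True
    while changed:
        changed = False
        for v in variables:
            if v != target and v not in blanket and not indep(target, v, blanket):
                blanket.append(v)
                changed = True
    # Shrink phase: drop false positives that became independent
    # of the target once the rest of the blanket was included.
    for v in list(blanket):
        rest = [u for u in blanket if u != v]
        if indep(target, v, rest):
            blanket.remove(v)
    return sorted(blanket)

# Toy oracle for the chain A -> T -> B plus an unrelated variable C:
# C is independent of everything; A, T, B are all pairwise dependent.
def indep(x, y, cond):
    return 'C' in {x, y}

print(grow_shrink('T', ['A', 'B', 'C'], indep))  # ['A', 'B']
```

The paper's contribution is showing that this style of blanket discovery (and IAMB and its variants) remains correct under the LWF chain graph interpretation, where the blanket's graphical characterization differs from the Bayesian network case.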